58 research outputs found

Precise event sampling on AMD versus Intel: quantitative and qualitative comparison

Author: Chabbi M
Kelly PHJ
Sasongko MA
Unat D
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/03/2023

Precise event sampling is a profiling feature in commodity processors that can sample hardware events and accurately locate the instructions that trigger them. This feature has been used in a large number of tools to detect application performance issues. Although precise event sampling is readily supported in modern multicore architectures, vendor implementations differ greatly in ways that affect their accuracy, stability, overhead, and functionality. This work presents the most comprehensive study to date on benchmarking the event sampling features of Intel PEBS and AMD IBS and performs an in-depth analysis of their key differences through a series of microbenchmarks. Our qualitative and quantitative analysis shows that PEBS allows finer-grained and more accurate sampling of hardware events, while IBS offers a richer set of information at each sample, though it suffers from lower accuracy and stability. Moreover, OS signal delivery, a common method used by profiling software, adds significant time overhead on top of the overhead incurred by the hardware mechanisms in both PEBS and IBS. We also found that both PEBS and IBS are biased when sampling events across multiple locations in a code. Lastly, we demonstrate how our findings on microbenchmarks under different thread counts hold for a full-fledged profiling tool that runs on state-of-the-art Intel and AMD machines. Overall, our detailed comparisons serve as a reference and provide valuable information for hardware designers and profiling tool developers.

Spiral - Imperial College Digital Repository

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

Author: C. Evans
D. Unat
J. Hammer
J. Hofmann
M. Wittmann
S. Williams
T. Grosser
Y. Lo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/01/2017

Achieving optimal program performance requires deep insight into the interaction between hardware and software. For software developers without an in-depth background in computer architecture, understanding and fully utilizing modern architectures is close to impossible. Analytic loop performance modeling is a useful way to understand the relevant bottlenecks of code execution based on simple machine models. The Roofline Model and the Execution-Cache-Memory (ECM) model are proven approaches to performance modeling of loop nests. In comparison to the Roofline model, the ECM model can also describe the single-core performance and saturation behavior on a multicore chip. We give an introduction to the Roofline and ECM models, and to stencil performance modeling using layer conditions (LC). We then present Kerncraft, a tool that can automatically construct Roofline and ECM models for loop nests by performing the required code, data transfer, and LC analysis. The layer condition analysis makes it possible to predict optimal spatial blocking factors for loop nests. Together with the models, it enables an ab-initio estimate of the potential benefits of loop blocking optimizations and of useful block sizes. In cases where LC analysis is not easily possible, Kerncraft supports a cache simulator as a fallback option. Using a 25-point long-range stencil, we demonstrate the usefulness and predictive power of the Kerncraft tool. Comment: 22 pages, 5 figures
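The basic Roofline bound mentioned in this abstract can be illustrated with a minimal sketch. It uses the textbook formulation only (attainable performance is the smaller of peak compute and memory bandwidth times arithmetic intensity); it is not Kerncraft's API or its ECM model, and the function name and machine numbers below are hypothetical.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Attainable performance (GFLOP/s) under the basic Roofline model.

    arithmetic_intensity: FLOPs performed per byte moved from memory
    peak_gflops:          peak compute throughput of the machine (GFLOP/s)
    bandwidth_gbs:        sustained memory bandwidth (GB/s)
    """
    if arithmetic_intensity < 0 or peak_gflops <= 0 or bandwidth_gbs <= 0:
        raise ValueError("inputs must be positive (intensity may be zero)")
    # A kernel is limited either by compute or by memory traffic,
    # whichever bound is lower.
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


if __name__ == "__main__":
    # Hypothetical machine: 100 GFLOP/s peak, 50 GB/s memory bandwidth.
    PEAK, BANDWIDTH = 100.0, 50.0
    for intensity in (0.5, 2.0, 8.0):
        bound = roofline_gflops(intensity, PEAK, BANDWIDTH)
        limiter = "memory" if intensity * BANDWIDTH < PEAK else "compute"
        print(f"intensity={intensity} FLOP/B -> {bound} GFLOP/s ({limiter}-bound)")
```

With these assumed numbers, kernels below 2 FLOP/byte are memory-bound and kernels above it are compute-bound.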

arXiv.org e-Print Archive

Crossref

Trends in Data Locality Abstractions for HPC Systems

Author: Abraham M
Bianco M
Chamberlain BL
Cledat R
Dubey A
Edwards HC
Finkel H
Fuerlinger K
Hannig F
Hoefler T
Jeannot E
Kamil A
Keasler J
Kelly PHJ
Leung V
Ltaief H
Maruyama N
Newburn CJ
Pericas M
Shalf J
Unat D
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.

INRIA a CCSD electronic archive server

eScholarship - University of California

Chalmers Research

Spiral - Imperial College Digital Repository

Chalmers Publication Library

Gender, Development, Values, Adaptation, and Discrimination in Acculturating Adolescents: The Case of Turk Heritage Youth Born and Living in Belgium

Author: A. Fuligni
A. G. Ryder
A. L. Lange
B. Tabachnick
C. Suárez-Orozco
C. Timmerman
C. Ward
D. Güngör
D. Hrubes
D. J. Hernandez
D. L. Sam
D. M. Taylor
D. Sam
Derya Güngör
E. Beurs de
E. Virta
F. M. Moghaddam
F. Motti-Stefanidi
H. Idema
H. Triandis
I. Jasinskaja-Lahti
J. Arends-Toth
J. Damme Van
J. P. Oudenhouven Vvan
J. S. Phinney
J. W. Berry
K. A. Ericsson
K. Kwak
K. L. Dion
K. Liebkind
K. Phalet
L. Hagendoorn
L. R. Derogatis
M. H. Bornstein
M. Ros
M. Rosenberg
M. Verkuyten
Marc H. Bornstein
N. Abadan-Unat
N. Lebedeva
P. H. Ramsey
P. R. Pessar
P. Vedder
R. W. Brislin
S. H. Schwartz
T. R. Gurr
U. Schönpflug
W. A. Arrindell
Y. A. Fijneman
Ç. Kağıtçıbaşı
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Compiler-Driven Data Layout Transformation for Heterogeneous Platforms

Author: D. Unat
Y. Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

Dictionary-Based interpolation technique for text quality enhancement

Author: Nuno-Maganda M.A.
Petrazzuoli G.
Unat D.
Publication venue: 'Informa UK Limited'

Crossref

An optimized implementation of 3D seismic wave simulation with GPUs

Author: Didem Unat
Graves Robert W.
Komatitsch D.
Michéa D.
Publication venue: 'Society of Exploration Geophysicists'

Crossref

Precise event sampling-based data locality tools for AMD multicore architectures

Author: Chabbi M
Kelly P
Sasongko MA
Unat D
Publication venue: 'Wiley'
Publication date: 14/03/2023

We propose COMDETECTIVE+, an inter-thread communication analyzer, and REUSETRACKER+, a reuse distance analyzer, that leverage the hardware features in AMD processors to support low-overhead profiling. Both tools employ the instruction-based sampling (IBS) facility and debug registers in AMD processors to detect inter-thread communication and data reuse. Unlike prior art, COMDETECTIVE+ differentiates communication into true and false sharing, and REUSETRACKER+ measures reuse distance in private and shared caches, also accounting for cache line invalidation, with low overhead. Both tools can attribute communications and reuses to source code lines. To our knowledge, these tools are two of the few profiling tools designed specifically for AMD x86 architectures using IBS. Our tools are timely and relevant considering the rising number of data centers and HPC systems based on AMD processors. We perform experiments to evaluate the accuracy and overheads of the proposed tools on an AMD machine with two-socket EPYC 7352 processors. COMDETECTIVE+ exhibits high accuracy while introducing 5.14× runtime and 1.4× memory overheads. REUSETRACKER+ also displays high accuracy (95%), with 11.76× runtime and 1.46× memory overheads. These overheads are much lower than the overheads of existing simulators and code instrumentation-based tools. Lastly, we demonstrate the usage of the tools by having COMDETECTIVE+ and REUSETRACKER+ facilitate the code refactoring of two data mining benchmarks to improve their performance by up to 29%.
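For readers unfamiliar with the metric, reuse distance can be illustrated with a minimal sketch. It computes the exact, trace-based definition (the number of distinct cache lines touched between two consecutive accesses to the same line). This is not how REUSETRACKER+ works: the abstract says the tool relies on IBS sampling and debug registers rather than a full trace. The function name, the 64-byte line size, and the sample trace are assumptions made for illustration.

```python
def reuse_distances(trace, line_size=64):
    """Return the reuse distance of every access in an address trace.

    trace:     sequence of byte addresses, in program order
    line_size: assumed cache line size in bytes (hypothetical default)

    The result has one entry per access: None for the first touch of a
    cache line, otherwise the number of distinct other cache lines that
    were accessed since the previous access to the same line.
    """
    last_seen = {}   # cache line -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        line = addr // line_size
        if line in last_seen:
            # Distinct lines touched strictly between the two accesses.
            between = {a // line_size for a in trace[last_seen[line] + 1:i]}
            distances.append(len(between))
        else:
            distances.append(None)
        last_seen[line] = i
    return distances


if __name__ == "__main__":
    # Hypothetical trace touching cache lines 0, 1, 2, 0, 1, 1.
    print(reuse_distances([0, 64, 128, 0, 64, 64]))
```

This quadratic-time version favors clarity; a small reuse distance means the data is likely still in cache when it is touched again.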

Spiral - Imperial College Digital Repository

Anticoagulants in COVID-19: Is there a role for a D-dimer-driven dosing?

Author: Caglayan P.
Damar G.
Karimov Z.
Taşbakan M. S.
Teymurlu F.
Unat D. Serce
Unat O. S.
Publication venue: European Respiratory Soc Journals Ltd
Publication date: 01/01/2022

[No Abstract Available]

Ege University Institutional Repository

Tiling as a Durable Abstraction for Parallelism and Data Locality

Author: Bell J
Chan CP
Shalf J
Unat D
Zhang W
Publication venue: eScholarship, University of California
Publication date: 18/11/2013

eScholarship - University of California